Geocentric models
Workspace setup:
A.k.a. normal distributions.
These distributions are unimodal and symmetric. They occur naturally, and they tend to be consistent with our assumptions (e.g., that measures are continuous values on a real number line, centered around a specific value).
Later in the course, we’ll move away from Gaussian distributions, but they’re a useful place to start.
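One reason Gaussians occur naturally: sums of many small, independent fluctuations converge toward a normal distribution. A minimal sketch (assuming the rethinking package is loaded for dens()):

```r
# sum 16 small random steps, repeated many times; the sums look Gaussian
set.seed(1)
pos <- replicate(1e4, sum(runif(16, -1, 1)))
dens(pos, norm.comp = TRUE) # overlay a normal curve for comparison
```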
Here’s an example:
\[\begin{align*} y_i &\sim \text{Normal}(\mu_i,\sigma) \\ \mu_i &= \beta x_i \\ \beta &\sim \text{Normal}(0,10) \\ \sigma &\sim \text{Exponential}(1) \\ x_i &\sim \text{Normal}(0,1) \\ \end{align*}\]
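To see what this model assumes about the data, we can simulate from its priors (a sketch; the variable names are mine):

```r
# one draw of (beta, sigma, x) per simulated observation
set.seed(1)
n     <- 1e4
x     <- rnorm(n, 0, 1)  # x_i ~ Normal(0, 1)
beta  <- rnorm(n, 0, 10) # beta ~ Normal(0, 10)
sigma <- rexp(n, 1)      # sigma ~ Exponential(1)
y     <- rnorm(n, beta * x, sigma) # y_i ~ Normal(mu_i, sigma)
dens(y) # implied distribution of observations
```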
Here’s the model for last week’s globe-tossing experiment:
\[\begin{align*} W &\sim \text{Binomial}(N,p) \\ p &\sim \text{Uniform}(0,1) \\ \end{align*}\]
The whole model can be read as:
The count \(W\) is distributed binomially with sample size \(N\) and probability \(p\). The prior for \(p\) is assumed to be uniform between 0 and 1.
We can use grid approximation to estimate the posterior distribution.
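A sketch of that grid approximation in base R (the counts W = 6 and N = 9 are placeholders here, not necessarily last week's data):

```r
# grid of candidate values for p
p_grid <- seq(from = 0, to = 1, length.out = 1000)
prior  <- rep(1, 1000)                           # uniform prior
likelihood <- dbinom(6, size = 9, prob = p_grid) # W = 6, N = 9 (assumed)
posterior  <- likelihood * prior
posterior  <- posterior / sum(posterior)         # standardize to sum to 1
plot(p_grid, posterior, type = "l",
     xlab = "probability of water", ylab = "posterior probability")
```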
Simulating from your priors – prior predictive simulation – is an essential part of modeling. This allows you to see what your choices imply about the data. You’ll be able to diagnose bad choices.
At this point in the course, I’m going to start throwing a lot of code at you. Do I expect you to memorize this code? Of course not.
Do you need to understand every single thing that’s happening in the code? Nope.
But, you’ll learn a lot by taking the time to figure out what’s happening in a code chunk. Class time will frequently include exercises where I ask you to adapt code I’ve shared in the slides to a new dataset or to answer a new problem. When doing so, go back through the old code and figure out what’s going on. Run the code one line at a time. Always observe the output and take some time to look at the object that was created or modified. Here are some functions that will be extremely useful:
For the model defined below, simulate observed \(y\) values from the prior:
\[\begin{align*} y_i &\sim \text{Normal}(\mu, \sigma) \\ \mu &\sim \text{Normal}(0, 10) \\ \sigma &\sim \text{Exponential}(1) \end{align*}\]
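One possible solution sketch (not the only reasonable answer; assumes dens() from the rethinking package):

```r
set.seed(128)
nsims   <- 1e4
sim_mu  <- rnorm(nsims, 0, 10) # mu ~ Normal(0, 10)
sim_sig <- rexp(nsims, 1)      # sigma ~ Exponential(1)
sim_y   <- rnorm(nsims, sim_mu, sim_sig) # y ~ Normal(mu, sigma)
dens(sim_y) # what the priors imply about observable y
```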
A sample of students is measured for height each year for 3 years. After the third year, you want to fit a linear regression predicting height using year as a predictor. Write down the mathematical model definition for this regression, using any variable names and priors you choose. Be prepared to defend your choice of priors.
Remember that every student got taller each year. Does this change your choice of priors? How?
The variance in heights for students of the same age is never more than 64 cm² (i.e., a standard deviation of at most 8 cm). Does this lead you to revise your priors?
Next, we'll use the Howell data (make sure you have the rethinking package loaded).
'data.frame': 544 obs. of 4 variables:
$ height: num 152 140 137 157 145 ...
$ weight: num 47.8 36.5 31.9 53 41.3 ...
$ age : num 63 63 65 41 51 35 32 27 19 54 ...
$ male : int 1 0 0 1 0 1 0 1 0 1 ...
mean sd 5.5% 94.5% histogram
height 138.2635963 27.6024476 81.108550 165.73500 ▁▁▁▁▁▁▁▂▁▇▇▅▁
weight 35.6106176 14.7191782 9.360721 54.50289 ▁▂▃▂▂▂▂▅▇▇▃▂▁
age 29.3443934 20.7468882 1.000000 66.13500 ▇▅▅▃▅▂▂▁▁
male 0.4724265 0.4996986 0.000000 1.00000 ▇▁▁▁▁▁▁▁▁▇
Write a mathematical model for the weights in this data set. (Don’t predict from other variables yet.) Note that weight is presented in kilograms.
Simulate both your priors and the expected observed weight values from the prior.
\[\begin{align*} w_i &\sim \text{Normal}(\mu, \sigma) \\ \mu &\sim \text{Normal}(59, 20) \\ \sigma &\sim \text{Uniform}(0, 50) \\ \end{align*}\]
Simulate your priors
nsims <- 1e4 # number of simulations
set.seed(128) # reproducibility
sim_mu <- rnorm( nsims, 59, 20) # simulate values of mu
sim_sig <- runif(nsims, 0, 50) # simulate values of sigma
par(mfrow = c(1,3)) # plot display has 1 row, 3 columns
dens(sim_mu)  # density of mu
dens(sim_sig) # density of sigma
plot(sim_mu, sim_sig, cex = .5, pch = 16,
col=col.alpha("#1c5253",0.1) ) # both together
Simulate values of weight.
Use grid approximation to calculate posterior distribution.
(Not necessary to copy, just for teaching purposes.)
# values of mu and sigma to test
mu.list <- seq( from=25, to=74 , length.out=200 )
sigma.list <- seq( from=2, to=10 , length.out=200 )
# fit every possible combination of m and s
post <- expand.grid( mu=mu.list , sigma=sigma.list )
# calculate log-likelihood of weights for each combination of m and s
post$LL <- sapply( 1:nrow(post) , function(i) sum(
dnorm( d2$weight , post$mu[i] , post$sigma[i] , log=TRUE ) ) )
# add priors
post$prod <- post$LL +
dnorm( post$mu , 59 , 20 , TRUE ) +
dunif( post$sigma , 0 , 50 , TRUE )
# convert from LL to p
post$prob <- exp( post$prod - max(post$prod) )
Go back to your code and change the range of values you estimate.
Rerun all the earlier code.
Cool, but we said last week that grid approximation is unwieldy and will quickly become unmanageable. So let’s repeat this process with quadratic approximation.
We won’t be calculating the probability or likelihood of values directly (too costly), but we can make some assumptions about the shapes of distributions and get an approximation of the shape of the posterior.
flist <- alist(
weight ~ dnorm( mu , sigma ) ,
mu ~ dnorm( 59 , 20 ) ,
sigma ~ dunif( 0 , 50 )
)
m4.1 <- quap( flist , data=d2 )
precis( m4.1 )
mean sd 5.5% 94.5%
mu 44.994621 0.3436040 44.44548 45.543767
sigma 6.447531 0.2430006 6.05917 6.835893
These numbers provide Gaussian approximations for each parameter’s marginal distribution. This means the plausibility of each value of \(\mu\), after averaging over the plausibilities of each value of \(\sigma\), is given by a Gaussian distribution with mean 45.0 and standard deviation 0.34.
quap() approximates the posterior with a multivariate Gaussian distribution: there is more than one parameter, and the parameters may be correlated with each other.
You can extract samples.
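The samples below were drawn with something like the following (extract.samples() is from the rethinking package; the exact call is an assumption):

```r
post <- extract.samples(m4.1, n = 1e4) # draws from the approximate posterior
head(post)   # first few sampled (mu, sigma) pairs
precis(post) # summarize the samples
```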
mu sigma
1 45.30229 6.606653
2 44.42090 6.615017
3 45.16411 6.590588
4 44.46886 6.504117
5 45.33543 6.182642
6 45.27712 6.116216
mean sd 5.5% 94.5% histogram
mu 44.988474 0.3463769 44.427906 45.538055 ▁▁▁▁▂▅▇▇▅▂▁▁▁▁
sigma 6.449303 0.2458303 6.054812 6.842653 ▁▁▁▂▇▇▅▂▁▁▁
We might assume that height and weight are associated with each other. Indeed, within our sample:
Update your mathematical model to incorporate height. Simulate from your priors to see the implied regression lines.
\[\begin{align*} w_i &\sim \text{Normal}(\mu_i, \sigma) \\ \mu_i &= \alpha + \beta (h_i - \bar{h}) \\ \alpha &\sim \text{Normal}(59, 20) \\ \beta &\sim \text{Normal}(0, 10) \\ \sigma &\sim \text{Uniform}(0, 50) \\ \end{align*}\]
To simulate from our priors:
# simulate 100 lines
nsims <- 100
sim_alpha <- rnorm(nsims, 59, 20)
sim_beta  <- rnorm(nsims, 0, 10)
# mean height of the sample
xbar <- mean(d2$height)
# plot with nothing in it
plot(NULL, xlim = range(d2$height), ylim = c(-100, 400),
xlab = "height", ylab = "weight")
abline(h = 0, lty = 2) #line at 0
#plot each line
for(i in 1:nsims){
curve(sim_alpha[i] +sim_beta[i]*(x-xbar),
add = T,
col=col.alpha("#1c5253",0.4))
}
Describe in words what’s wrong with our priors.
Slope should not be negative. How can we fix this?
Could use a uniform distribution bounded by 0.
# simulate 100 lines
nsims <- 100
sim_alpha <- rnorm(nsims, 59, 20)
sim_beta  <- runif(nsims, 0, 10)
# mean height of the sample
xbar <- mean(d2$height)
# plot with nothing in it
plot(NULL, xlim = range(d2$height), ylim = c(-100, 400),
xlab = "height", ylab = "weight")
abline(h = 0, lty = 2) #line at 0
#plot each line
for(i in 1:nsims){
curve(sim_alpha[i] +sim_beta[i]*(x-xbar),
add = T,
col=col.alpha("#1c5253",0.4))
}
Fit the new weight model using the quadratic approximation.
flist <- alist(
weight ~ dnorm( mu , sigma ) ,
mu <- a + b*(height - mean(height)),
a ~ dnorm( 59 , 20 ) ,
b ~ dunif(0, 10),
sigma ~ dunif( 0 , 50 )
)
m2 <- quap( flist , data=d2 )
precis( m2 )
mean sd 5.5% 94.5%
a 44.9922582 0.22542884 44.6319794 45.3525370
b 0.6294331 0.02915969 0.5828303 0.6760359
sigma 4.2296880 0.15941299 3.9749152 4.4844607
Draw lines from the posterior distribution and plot with the data.
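One way to sketch this, reusing m2 and d2 from above:

```r
# draw 100 parameter sets from the posterior
post <- extract.samples(m2, n = 100)
xbar <- mean(d2$height)
# plot the raw data
plot(d2$height, d2$weight, cex = 0.5, pch = 16,
     col = col.alpha("#1c5253", 0.3),
     xlab = "height", ylab = "weight")
# overlay a regression line for each posterior draw
for (i in 1:100) {
  curve(post$a[i] + post$b[i] * (x - xbar),
        add = TRUE, col = col.alpha("black", 0.2))
}
```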
A side note: a major concern or critique of Bayesian analysis is that the subjectivity of the priors allows for nefarious behavior. “Putting our thumbs on the scale,” so to speak. But priors are quickly overwhelmed by data. Case in point:
flist <- alist(
weight ~ dnorm( mu , sigma ) ,
mu <- a + b*(height - mean(height)),
a ~ dnorm( 59 , 20 ) ,
b ~ dnorm(-5, 1),
sigma ~ dunif( 0 , 50 )
)
m2 <- quap( flist , data=d2 )
precis( m2 )
mean sd 5.5% 94.5%
a 44.992126 0.22542830 44.6318478 45.3524038
b 0.624428 0.02914966 0.5778413 0.6710148
sigma 4.229678 0.15941534 3.9749013 4.4844543